[57] ABSTRACT

Classifiers (110) and a comparator (112) perform an identi- 
fication method (400) to identify a class as one of a prede- 
termined set of classes. The identification method is based 
on determining the observation costs associated with the 
unidentified class. The identification method includes com- 
bining models representing the predetermined set of classes 
and the unidentified vectors representing the class. The 
predetermined class associated with the largest observation 
cost is identified as the class. Additionally a unique, low- 
complexity training method (300) includes creating the 
models which represent the predetermined set of classes. 

23 Claims, 3 Drawing Sheets 
PATTERN CLASSIFIER WITH TRAINING SYSTEM AND METHODS OF OPERATION THEREFOR

FIELD OF THE INVENTION

This invention relates in general to the field of classifiers,
and, in particular, to polynomial classifiers.

BACKGROUND OF THE INVENTION

Modern classifiers use techniques which are highly complex when
high accuracy classification is needed. For example, a traditional
neural network structure needing high accuracy also needs a complex
structure to perform classification because of difficulty in
grouping different classes within the neural network structure.

Additionally, in pattern recognition systems such as speech
recognition, when a spoken command is identified, the spoken
command is identified as one of a group of commands represented by
a collection of models. Existing speech recognition systems require
large amounts of processing and storage resources to identify a
spoken command from a collection of models because the systems fail
to use a combination of observation cost and state information to
train low complexity models for identifying spoken commands.

Another problem with existing systems is that polynomial
classifiers fail to use a combination of observation cost and state
information when performing identification of classes (e.g., spoken
commands, phoneme identification, digital images, radio signatures,
communication channels, etc.). Additionally, a problem with
training systems for polynomial classifiers is that existing
systems do not train models using a method which exploits state
information within training data.

Another problem with speech recognition systems is that such
systems require accurate and low complexity methods for identifying
an acoustic event. Typically, this is accomplished by separating
speech into isolated phonetic units; for example, the word "happy"
is represented as a sequence of four phonemes "H", "AE", "P", "IY".
A popular technique for determining phonemes from a spoken word is
to use the Hidden Markov Model (HMM). HMMs classify by
incorporating a finite state machine in a stochastic framework.
HMMs represent the order of the phonetic sounds by states. In an
HMM, the probability that a certain sound has been emitted is
encapsulated in observation probabilities. These probabilities are
typically modeled by a Gaussian Mixture Model (GMM). A problem with
GMMs is that GMMs only provide limited accuracy for text
independent speaker verification. Another problem is that GMMs only
provide a local optimum.

Thus, what is needed are a system and method for identifying
classes from a collection of predetermined classes using limited
processing and storage resources. What is also needed are a system
and method which can train a set of predetermined classes using
limited processing and storage resources. What is also needed are a
system and method which combine observation cost and state
information when identifying classes from a set of predetermined
classes and training models which represent the set of
predetermined classes. What is also needed are a system and method
for identifying an acoustic event. Also needed are a system and
method for accurately modeling the probability that a certain sound
emitted for text independent speaker verification is encapsulated
in observation probabilities. What is also needed are a system and
method for modeling a global optimum that a certain sound which has
been emitted is encapsulated in observation probabilities.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is pointed out with particularity in the appended
claims. However, a more complete understanding of the present
invention may be derived by referring to the detailed description
and claims when considered in connection with the figures, wherein
like reference numbers refer to similar items throughout the
figures, and:

FIG. 1 illustrates a simplified block diagram of a classifier and
training system in accordance with a preferred embodiment of the
present invention;

FIG. 2 illustrates a simplified block diagram of a classifier in
accordance with a preferred embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method for training a model
for use in a classifier in accordance with a preferred embodiment
of the present invention; and

FIG. 4 is a flowchart illustrating a method for identifying a class
as a predetermined class in accordance with a preferred embodiment
of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention includes, among other things, classifiers and
a comparator to perform an identification method to identify a
class as one of a predetermined set of classes. The identification
method is based on determining the observation costs associated
with the unidentified class. The identification method includes
combining models representing the predetermined set of classes and
the unidentified vectors representing the class. The predetermined
class associated with the largest observation cost is identified as
the class. Additionally, a unique, low-complexity training method
includes creating the models which represent the predetermined set
of classes.

Also, the present invention provides a system and method for
identifying classes from a collection of predetermined classes
using limited processing and storage resources. The present
invention also provides a system and method which can train a set
of predetermined classes using limited processing and storage
resources. The present invention also provides a system and method
which combine observation cost and state information when
identifying classes from a set of predetermined classes and
training models which represent the set of predetermined classes.
The present invention also provides a system and method for
identifying an acoustic event. Also, the present invention provides
a system and method for accurately modeling the probability that a
certain sound emitted for text independent speaker verification is
encapsulated in observation probabilities. The present invention
also provides a system and method for modeling a global optimum
that a certain sound has been emitted is encapsulated in
observation probabilities.

A "class" is defined herein to mean a category (e.g., label)
provided to a representation of an item. For example, the word
"happy" is the class (e.g., label) associated with a feature vector
representation of a speech sample of an individual speaking the
word "happy". A "class" may also refer to a category (e.g., label)
provided to a group of items (e.g., group of words). A "class
label" is defined herein to mean a label associated with a class. A
"model structure" is defined herein to mean a vector. When the
vector is a model structure, the vector is a summation of a set of
feature
vectors which represent the class or classes associated therewith.
A "model" is defined herein to mean a vector. When the vector is a
class model, the vector has elements which are weighted based
primarily on the class associated therewith. A "feature vector" is
defined herein to mean a vector which represents the
characteristics of an item. For example, when a removed silence
speech sample is represented as a set of cepstral coefficients, the
cepstral coefficients representing the speech sample are referred
to as a "feature vector". Feature vectors may be used to represent,
among other things, spoken commands, phonemes, radio signatures,
communication channels, modulated signals, biometrics, facial
images, and fingerprints.

FIG. 1 illustrates a simplified block diagram of a classifier and
training system in accordance with a preferred embodiment of the
present invention. Classifier and training system (CTS) 100
illustrates a system capable of identifying a class as at least one
of a set of predetermined classes and training a set of models to
represent the set of predetermined classes. In the preferred
embodiment of the present invention, CTS 100 includes feature
memory 102, training processor 104, model memory 108, classifiers
110, and comparator 112.

CTS 100 may be implemented in hardware, software, or a combination
of hardware and software. In the preferred embodiment, CTS 100 is
implemented in software.

Training processor 104 is preferably coupled to feature memory 102
via feature vector input 101. Training processor 104 is also
coupled to model memory 108 via model memory input 107.
Additionally, training processor 104 is connected to classifiers
110 via classifier outputs 105. Preferably, training processor 104
retrieves feature vectors from feature memory 102 and receives
feature vectors from an external system via feature vector input
101. In the preferred embodiment, feature vectors stored in feature
memory 102 represent a set of predetermined classes. Preferably,
training processor 104 determines models for the set of
predetermined classes based on feature vectors by performing a
training procedure discussed below. In the preferred embodiment,
training processor 104 associates feature vectors with
predetermined states based on a training method performed by
classifiers 110. When training for models is complete, training
processor 104 preferably stores models in model memory 108.

Classifiers 110 are preferably coupled to model memory 108 via
model input 103. Classifiers 110 receive feature vectors from
feature vector input 101. Classifiers 110 are also coupled to
comparator 112 via classifier outputs 105. In the preferred
embodiment, each of classifiers 110 receives a model from model
memory 108 and combines (e.g., performs a dot product) the model
with unidentified feature vectors received via feature vector input
101. Preferably, the output from each of classifiers 110 is a total
observation cost and a series of states representing that total
observation cost. In the preferred embodiment, one observation cost
is output for each set of unidentified feature vectors. Also, in
another embodiment, one series of states is output for each set of
unidentified feature vectors.

In the preferred embodiment, the number of classifiers 110 is equal
to the number of predetermined classes. Preferably the number of
classifiers 110 ranges between two and several thousand, although
other numbers of classifiers 110 are possible based on the
application.

Comparator 112 is coupled to classifiers 110 via classifier outputs
105. In the preferred embodiment, comparator 112 receives a total
observation cost from each classifier 110. As discussed above, each
observation cost is preferably associated with a set of
unidentified feature vectors. Training processor 104 receives a
series of states from each classifier 110. As discussed above, each
series of states is preferably associated with a set of
unidentified feature vectors. In the preferred embodiment, each
unidentified feature vector is associated with one state.
Unidentified feature vectors represent a class which is to be
identified from a set of predetermined classes.

Comparator 112 preferably compares the total observation costs
(TOCs) generated by classifiers 110. When comparator 112 receives
TOCs associated with a set of unidentified feature vectors,
comparator 112 preferably compares the costs to determine the
largest cost. Based on the largest cost, comparator 112 preferably
associates a predetermined class with the unidentified feature
vectors representing a class to identify the class from the set of
predetermined classes. Comparator 112 preferably outputs the
identified class via class output 109.

FIG. 2 illustrates a simplified block diagram of a classifier in
accordance with a preferred embodiment of the present invention.
Classifier 110 illustrates a classifier which accepts a set of
models via model input 103 which represent a predetermined class,
and feature vectors via feature input 101 which represent an
unidentified class. In the preferred embodiment, feature vectors
received via feature input 101 are received by expander 217 and
output from expander 217 via expanded output 201. Expanded output
201 is preferably received by model multipliers 205-208. Expander
217 preferably performs a polynomial expansion for feature vectors
as described below. Preferably, each of model multipliers 205-208
performs a dot product using a model and an expanded unidentified
feature vector. For example, when classifier 110 contains models
which represent the word "happy", each of the models operated on by
model multipliers 205-208 preferably represents a phoneme for the
word, such as: "H" (model multiplier 205); "AE" (model multiplier
206); "P" (model multiplier 207); "IY" (model multiplier 208). Each
of model multipliers 205-208 preferably generates an observation
cost based on the dot product associated therewith. The observation
cost associated with performing each dot product is preferably
conveyed to selector 215 via multiplier output 211.

Although the embodiment discussed above uses classifiers 110, other
classifiers are suitable. For example, suitable classifiers may be
found in U.S. patent application Ser. No. 09/020953, entitled
"MULTIRESOLUTIONAL CLASSIFIER WITH TRAINING SYSTEM AND METHOD",
which is assigned a filing date Feb. 9, 1998, or U.S. patent
application Ser. No. 09/045361, entitled "TREE-STRUCTURED
CLASSIFIER AND TRAINING APPARATUSES AND METHODS OF OPERATION
THEREFOR", which is assigned a filing date Mar. 18, 1998, the
subject matter of which is incorporated by reference herein.

The observation cost associated with each of model multipliers
205-208 is preferably accumulated in memory 212 by selector 215 in
accordance with the trellis diagram. Note that after processing
five feature vectors, selector 215 (e.g., four state trellis
diagram left-to-right model) is in a steady state in accordance
with the trellis diagram.

As discussed above, selector 215 accumulates the total cost
associated with each model multiplier and therefore each state.
Selector 215 accumulates the total observation cost for each state
in accordance with the trellis diagram. Again, for example, the
total observation cost for the "state 1" model multiplier (e.g.,
model multiplier 205) is equal to the previous total cost
(initially zero) added to the observation cost determined by the
"state 1" model multiplier. Since, according to the trellis
diagram, state 1 only transitions to state 1 from state 1, the
total observation cost for state 1 is the previous observation cost
for state 1 added to the new observation cost for state 1. Per the
trellis diagram, similar arguments hold for determining the total
observation cost for each state in the predetermined class. In
cases where the selector determines two total observation costs for
a model multiplier (primarily because of different paths available
in the trellis diagram), the larger total observation cost is
selected by selector 215 and this cost is determined to be the
total observation cost for the associated state.
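
The accumulation performed by the selector can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `accumulate_trellis`, the 0-based state numbering, and the left-to-right rule that a state is reached either from itself or from the state immediately before it are choices made here, since the trellis diagram itself is not reproduced in the text.

```python
from math import inf

def accumulate_trellis(costs):
    """Accumulate observation costs over a left-to-right trellis.

    costs[t][s] is the observation cost produced by the model multiplier
    for state s on feature vector t (states are 0-based here).
    Returns (totals, path): totals[s] is the total observation cost held
    for state s after the last vector, and path is the series of states
    that leads to the largest total.
    """
    n_states = len(costs[0])
    # Assumption: the first feature vector is always scored by the first state.
    totals = [costs[0][0]] + [-inf] * (n_states - 1)
    back = []  # back[k][s]: predecessor chosen for state s at vector k + 1
    for row in costs[1:]:
        new_totals, choices = [], []
        for s in range(n_states):
            stay = totals[s]
            advance = totals[s - 1] if s > 0 else -inf
            # When two paths reach the same state, keep the larger total.
            choices.append(s if stay >= advance else s - 1)
            new_totals.append(max(stay, advance) + row[s])
        totals = new_totals
        back.append(choices)
    # Trace the series of states back from the largest total.
    state = max(range(n_states), key=lambda s: totals[s])
    path = [state]
    for choices in reversed(back):
        state = choices[state]
        path.append(state)
    path.reverse()
    return totals, path
```

A state that cannot yet be reached keeps a total of negative infinity, so it never wins a comparison until the path has advanced far enough.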

Although this embodiment of the invention discusses a 
left-to-right HMM model with four states, other HMMs are 
suitable. Some examples of suitable HMMs are ergodic 
HMM and arbitrary state transition matrix HMM. 

In the preferred embodiment, selector 215 accumulates the
observation cost associated with each of the operations
performed by each of model multipliers 205-208.
Preferably, a total observation cost is accumulated for each
of model multipliers 205-208. Preferably, selector 215 is
coupled to memory 212. Selector 215 stores the total
observation cost for each of model multipliers 205-208 in memory
212. Preferably, each classifier 110 outputs the largest total
observation cost stored in the associated memory 212 via
classifier outputs 105.
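
As a minimal sketch of how these pieces fit together, the following Python shows a model multiplier as a dot product and the comparator as a selection of the largest total observation cost. The function names and the dictionary keyed by class label are assumptions made for illustration, not structures named in the text.

```python
def observation_cost(model, expanded_vector):
    """Dot product of a model with an expanded feature vector."""
    return sum(w * p for w, p in zip(model, expanded_vector))

def identify_class(total_costs):
    """Return the class label whose classifier reported the largest
    total observation cost (TOC).

    total_costs maps a class label to the TOC output by its classifier.
    """
    return max(total_costs, key=total_costs.get)
```

For instance, `identify_class({"happy": 3.2, "hat": 1.7})` selects `"happy"`, because its classifier reported the larger cost.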

In the preferred embodiment, during a training method
described below, a series of states associated with the path
through the model multipliers generating the largest
observation costs is accumulated in memory 212. Preferably, a
state is accumulated in memory 212 for each of the
unidentified feature vectors. Preferably, classifier 110 outputs the
series of states via classifier outputs 105 and training
processor 104 (FIG. 1) uses the series of states to train the
models for each classifier 110.

FIG. 3 is a flowchart illustrating a method for training a 
model for use in a classifier in accordance with a preferred 
embodiment of the present invention. In the preferred 
embodiment, method 300 is performed by a training pro- 
cessor to train (e.g., create) a set of models for use in an 
identification method (discussed below). Preferably, each of 
the set of models represents at least part of a class. 
Accordingly, each class represents, for example, spoken
commands, phonemes, radio signatures, communication 
channels, modulated signals, biometrics, facial images, 
fingerprints, etc. 

In step 305, vectors for the predetermined set of classes
are associated with predetermined states. In the preferred
embodiment, feature vectors which represent a set of
predetermined classes are associated with states which
represent the classes. For example, assume that one of the set of
predetermined classes represents the word "happy". The
word "happy" is preferably represented as a set of four
phonemes: "H", "AE", "P", "IY". Each of the phonemes
represents one of four states for the word. Feature vectors
are associated with phonemes and phonemes are associated
with one of four states to represent the word. States may be
arbitrarily associated with each of the phonemes. For
example, state 1 (S1) is associated with "H", state 2 (S2) is
associated with "AE", state 3 (S3) is associated with "P",
and state 4 (S4) is associated with "IY". Initially, feature
vectors which represent each state may be divided up
equally. For example, say "happy" is represented by four
hundred feature vectors. Each of the four phonemes is
assigned one hundred feature vectors. Preferably, these
"divisions" of feature vectors are used to determine an initial
set of models in accordance with method 300.
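
The equal initial division can be illustrated with a short Python sketch. The function name, the 0-based state labels, and the handling of a remainder (the text only covers the evenly divisible case) are assumptions made here.

```python
def initial_segmentation(n_vectors, n_states):
    """Assign feature vectors to states in equal, contiguous blocks.

    Returns a list with one state label (0-based) per feature vector.
    Assumption: any remainder is spread over the earliest states.
    """
    base, extra = divmod(n_vectors, n_states)
    states = []
    for s in range(n_states):
        states.extend([s] * (base + (1 if s < extra else 0)))
    return states
```

With four hundred feature vectors and four states, each state label appears one hundred times, matching the "happy" example.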

Further assume that another one of the set of
predetermined classes represents the word "hat". The word "hat" is
represented by a set of three phonemes: "H", "AE", "T". As
discussed above, feature vectors are associated with
phonemes which are also associated with states. Continuing
with the above example, state 1 (S1) is assigned to "H", state
2 (S2) is assigned to "AE", and state 3 (S3) is assigned to
"T".

In a preferred embodiment, when a set of feature vectors 
represents a class and each class represents a word, feature 
vectors are determined from a speech sample. A set of 
feature vectors is determined from a series of overlapping 
windows of sampled speech (e.g., Hamming windows). 
Preferably, a feature vector is created for each Hamming 
window, wherein, each Hamming window represents a 
speech sample having the silence removed. 

In a preferred embodiment, a linear predictive (LP)
analysis is performed and includes generating a
predetermined number of coefficients for each Hamming window of
the removed silence speech sample. Preferably the number
of coefficients for the LP analysis is determined by the LP
order. LP orders of 10, 12 and 16 are desirable; however, other
LP orders may be used. A preferred embodiment uses an LP
order of 12. In a preferred embodiment, step 305 generates
12 coefficients for every Hamming window (e.g., every 10
milliseconds, 30 milliseconds of removed silence speech).
The result of step 305 may be viewed as a Z×12 matrix,
where Z is the number of rows and 12 (the LP order) is the
number of columns. Z is dependent on the length of the
removed silence speech sample, and may be on the order of
several hundred or thousand rows. The Z×12 matrix of step
305 may also be viewed as Z sets of LP coefficients. In this
example, there are 12 LP coefficients for every Hamming
window of the removed silence speech. Each set of LP
coefficients represents a feature vector. Additionally, cepstral
coefficients are determined from the LP coefficients.
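
The windowing that precedes the LP analysis can be sketched as follows. This is a hedged illustration only: the function names, the sample rate parameter, and the reading of the example as a 30 millisecond window advanced every 10 milliseconds are assumptions, and the LP analysis itself is not shown.

```python
import math

def hamming_window(n):
    """Standard Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(samples, rate_hz, window_ms=30, hop_ms=10):
    """Split silence-removed speech into overlapping Hamming windows.

    Each returned frame would yield one row (one feature vector) of the
    Z x 12 matrix once an order-12 LP analysis is applied to it.
    """
    win = int(rate_hz * window_ms / 1000)
    hop = int(rate_hz * hop_ms / 1000)
    weights = hamming_window(win)
    frames = []
    for start in range(0, len(samples) - win + 1, hop):
        segment = samples[start:start + win]
        frames.append([s * w for s, w in zip(segment, weights)])
    return frames
```

The number of frames returned corresponds to Z, so it grows with the length of the silence-removed speech sample.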

In a preferred embodiment, step 305 includes performing
a linear transform on the LP coefficients. Preferably, the
linear transformation performed includes a cepstral analysis
which separates unwanted from wanted information,
retaining information important to speech recognition. Performing
the cepstral analysis is an optional part of step 305; however,
for accurately identifying speech, cepstral analysis should be
performed. Determining cepstral coefficients is a process
known in the art. The result of performing the cepstral
analysis may be viewed as a Z×24 matrix where 12 is the
cepstral order. The cepstral order may be the same order as
the LP order. The collection of feature vectors for the series
of Hamming windows is comprised of either the sets of LP
coefficients or cepstral coefficients associated therewith.
In step 305, each "training" feature vector is processed by
its associated classifier 110. Based on a dot product
operation for each training feature vector with the initial version
of the predetermined models, a new series of states (e.g., sets
of feature vectors for each model) is determined. As
discussed above, processing for each training feature vector to
determine which vectors represent which models is in
accordance with the trellis diagram representing predetermined
states (e.g., set of models) for a classifier. In the preferred
embodiment, a fixed number of iterations are performed to
associate feature vectors with states. For example, five
iterations may be performed and the resulting series of states
is used to retrain the models representing a predetermined
class.
In another embodiment, iterations are performed to
associate feature vectors with states until a predetermined
percentage of states are unchanged from one iteration to the
next iteration. For example, if a set of ten feature vectors
represented two phonemes and, between two sequential
iterations, less than a 10% change occurred in the
segmentation of feature vectors between phonemes, the
segmentation of feature vectors would be complete.
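
One way to express this stopping rule is shown below. The function name and the default threshold argument are illustrative assumptions; the 10% figure is the example value from the text.

```python
def segmentation_converged(previous, current, max_changed=0.10):
    """Return True when the share of feature vectors whose state changed
    between two sequential iterations is below max_changed.

    previous and current hold one state label per feature vector.
    """
    changed = sum(1 for p, c in zip(previous, current) if p != c)
    return changed / len(current) < max_changed
```

Because the comparison is strict, one changed vector out of ten (exactly 10%) does not yet count as converged under this reading.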

In step 310, the coefficients for the vectors representing
each of the models are vector quantized. In a preferred
embodiment, a vector quantization is performed on the
cepstral coefficients of the feature vectors representing the
models (e.g., states) for the class. In a preferred
embodiment, one purpose of step 310 is to cluster the speech
information for each model into a common size matrix
representation. Step 310 is performed since step 305 may
produce a different number of feature vectors for each model
because each phoneme may have a speech sample of a
different time length. The vector quantization of step 310
results in a predetermined number of feature vectors for each
model. Codebook size input 315 is an input to step 310 and
represents the number of feature vectors to be determined in
step 310.

Alternative to step 310, another embodiment of the 
present invention uses a fixed codebook (e.g., as used by a 
vocoder). When a fixed codebook size is used, each feature 
vector is quantized using the fixed codebook. This alterna- 
tive embodiment allows indices of predetermined feature 
vectors to be stored in feature memory instead of storing 
feature vectors. Indices are preferably represented as an 
integer and require less storage space than storing feature 
vectors representing each class. Indices are used as an index 
into the codebook where feature vectors are preferably 
stored. Storing indices instead of feature vectors may be 
chosen when limiting the amount of memory is preferred 
over processing performance. 
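
A minimal sketch of the fixed-codebook alternative follows. The function name and the use of squared Euclidean distance as the matching rule are assumptions; the text does not say how the nearest codebook entry is chosen.

```python
def quantize(vector, codebook):
    """Return the index of the codebook entry closest to vector.

    Assumption: closeness is measured by squared Euclidean distance.
    Only the returned integer needs to be stored in feature memory.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))
```

The stored index is later used to look up `codebook[index]`, trading a lookup at processing time for a smaller feature memory.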

In step 320, a polynomial expansion for each of the
vectors is performed. In a preferred embodiment, a high
order polynomial expansion is performed on each vector
representing each model. Preferably, the high order
polynomial expansion is a fourth order polynomial expansion;
although, other polynomial orders are suitable. Preferably,
the polynomial order for the high order polynomial
expansion performed in step 320 is determined from polynomial
order input 322. Desirably, the polynomial order input 322
is in the range of 2 to 4. The results of step 320 are viewed
as one matrix. When the cepstral order is 12 and cepstral
coefficients are calculated, the high order polynomial
expansion, when performed for each vector, produces a high
order matrix of dimension codebook size input number of
rows and 20,475 columns.
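
The column count can be checked with a short sketch. It assumes each feature vector has 24 elements (taken from the Z×24 matrix described earlier) and that the expansion contains every monomial up to the given order; the function name and the ordering of terms are illustrative choices, not stated in the text.

```python
from itertools import combinations_with_replacement
from math import comb, prod

def poly_expand(x, order):
    """Return all monomials of the elements of x up to the given order,
    starting with the constant term 1."""
    terms = []
    for degree in range(order + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            terms.append(prod(x[i] for i in idx))
    return terms

# Number of monomials of degree <= 4 in 24 variables: C(24 + 4, 4).
n_columns = comb(24 + 4, 4)
```

Under these assumptions `n_columns` evaluates to 20,475, and a two-element vector expanded to second order gives the six terms 1, x1, x2, x1², x1·x2, x2².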

In step 325, vectors are combined to determine an
individual model structure for each of the set of models. In a
preferred embodiment, an individual model structure is
determined by summing the feature vectors of the high order
matrix determined in step 320. In a preferred embodiment,
the individual model structure is calculated for each model
(e.g., state). The result of step 325 is a single vector (e.g.,
individual model structure) of the same dimension as a single
vector of the high order matrix. In the embodiment having
a high order matrix with the dimensions discussed in step
320, the resultant individual model structure (e.g., vector)
has 20,475 elements.

In step 330, a total model structure is determined. In a
preferred embodiment, a summation of each individual
model structure is performed to determine the total model
structure. Preferably, the summation is performed using the
individual model structures determined in step 325.

In step 335, a combined model structure for each of the set
of models is produced. In a preferred embodiment, the
combined model structure, $r_{A,\text{combined}}$, for a model is
determined by adding the total model structure (step 330) and a
scaled version of an individual model structure associated
therewith. For example, when a model, say model A, is
trained for a phoneme (e.g., phoneme 1) and the class which
includes model A is represented by 5 phonemes (e.g.,
phoneme 1, phoneme 2, . . . phoneme 5), the combined
model structure representing model A is provided by
equation (eqn.) 1,

$$r_{A,\text{combined}} = r_{\text{total}} + \left(\frac{N_{\text{all}}}{N_1} - 2\right) r_{A,\text{model}} \qquad \text{(eqn. 1)}$$

wherein,

$r_{A,\text{combined}}$ is the combined model structure for model A,

$r_{\text{total}}$ is the total model structure determined in step 330 for
the combination of all phonemes being trained (e.g.,
phoneme 1, phoneme 2, . . . , phoneme 5),

$N_{\text{all}}$ is a summation of the number of feature vectors
representing each phoneme (e.g., the number of feature
vectors for phoneme 1, phoneme 2, . . . , phoneme 5),

$N_1$ is the number of feature vectors representing phoneme 1,

$r_{A,\text{model}}$ is the individual model structure for model A
determined in step 325.

Preferably, scaling factor input 340 represents a scaling factor
term (e.g., $((N_{\text{all}}/N_1) - 2)$) in eqn. 1.
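
Steps 325 through 335 can be sketched together in Python. The function name and the plain-list representation are assumptions made for illustration; the arithmetic follows the description of step 335 (total model structure plus the individual model structure scaled by N_all/N_k minus 2).

```python
def combined_structures(individual, counts):
    """Compute the total and combined model structures.

    individual[k] is the individual model structure (step 325) of model k.
    counts[k] is the number of feature vectors representing model k.
    Returns (total, combined), where combined[k] is the combined model
    structure for model k.
    """
    n_all = sum(counts)
    # Step 330: element-wise sum of all individual model structures.
    total = [sum(column) for column in zip(*individual)]
    combined = []
    for r_k, n_k in zip(individual, counts):
        scale = n_all / n_k - 2  # scaling factor term of step 335
        combined.append([t + scale * v for t, v in zip(total, r_k)])
    return total, combined
```

For two models with structures `[1, 2]` and `[3, 4]` and counts 10 and 30, the total is `[4, 6]` and the first scaling factor is 40/10 − 2 = 2, giving a combined structure of `[6.0, 10.0]` for the first model.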

In step 345, the combined model structure is mapped to a
matrix for each of the models. In a preferred embodiment, a
matrix representing a model, and therefore at least part of a
predetermined class, is titled a model matrix. The model
matrix for the A-th model is represented as $R_A$. Preferably,
the method for mapping a combined model structure,
$r_{A,\text{combined}}$, to a model matrix, $R_A$, is best illustrated as an
example. Consider, for example, the case of a two element
combined model structure, $r_{A,\text{combined}}$ in eqn. 2,

$$x = \begin{bmatrix} x_1 & x_2 \end{bmatrix}' \qquad \text{(eqn. 2)}$$

The second order expansion (i.e., high order polynomial
expansion) for eqn. 2 is provided in eqn. 3,

$$r_{A,\text{combined}} = p_2(x) = \begin{bmatrix} 1 & x_1 & x_2 & x_1^2 & x_1 x_2 & x_2^2 \end{bmatrix}' \qquad \text{(eqn. 3)}$$

A square model matrix having row and column dimensions
is determined by eqn. 4,

$$p(x)\,p(x)' = \begin{bmatrix} 1 & x_1 & x_2 \\ x_1 & x_1^2 & x_1 x_2 \\ x_2 & x_1 x_2 & x_2^2 \end{bmatrix} \qquad \text{(eqn. 4)}$$

where p(x)' represents the transpose of vector p(x).

Therefore, in a preferred embodiment, the mapping of the
combined model structure to the model matrix is performed
by copying the second order elements (high order
polynomial expansion) found in eqn. 3 to the corresponding matrix
element in eqn. 4. Again, for example, the $x_1 x_2$ element of
eqn. 3 would map to the matrix elements having indices
$R_A(3,2)$ and $R_A(2,3)$. The mapping approach described in
step 345 can be extended to higher order systems.
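
The mapping can be sketched by indexing monomials. The sketch assumes the combined model structure lists its terms in the order 1, x1, x2, x1², x1·x2, x2² (and the analogous order for larger cases); the function names and this ordering are illustrative assumptions rather than details given in the text.

```python
from itertools import combinations_with_replacement

def monomials(n_vars, order):
    """Index tuples of all monomials up to the given order, e.g. (0, 1)
    stands for x1*x2 and () stands for the constant term."""
    return [idx for degree in range(order + 1)
            for idx in combinations_with_replacement(range(n_vars), degree)]

def map_to_matrix(r_combined, n_vars, low_order):
    """Copy elements of the combined model structure into a square matrix.

    Entry (i, j) receives the element of r_combined that belongs to the
    product of the i-th and j-th low order monomials.
    """
    low = monomials(n_vars, low_order)
    position = {m: k for k, m in enumerate(monomials(n_vars, 2 * low_order))}
    return [[r_combined[position[tuple(sorted(a + b))]] for b in low]
            for a in low]
```

With `r_combined = [10, 11, 12, 13, 14, 15]`, two variables, and a low order of 1, the x1·x2 element (14) lands at the positions the text calls (2,3) and (3,2), using 1-based indices.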

In step 350, the matrix for each model is decomposed. In
a preferred embodiment, a model matrix for the A-th model
(e.g., state) is decomposed using Cholesky decomposition.
For example, the Cholesky decomposition for $R_A$ is
represented in equation form in eqn. 5,

$$L_A^{t} L_A = R_A \qquad \text{(eqn. 5)}$$

where $L_A^{t}$ is the transpose of matrix $L_A$ and both matrices
are determined using Cholesky decomposition.

In step 355, each of the set of models is created. In a
preferred embodiment, a model, $w_A$, is determined using
back substitution. For example, eqn. 6 can be solved for $w_A$
(e.g., model A),

where $L_A^{t}$, $L_A$, $w_A$, $N_{\text{all}}$, and $N_1$ are each described above.
Preferably, $a_A$ is a low order model structure for the A-th
model. In a preferred embodiment, $a_A$ is determined using a
method similar to the method for determining the individual
model structure (step 325). The polynomial order for the low
order model structure is preferably half the polynomial order
for the individual model structure. Since the low order
model structure elements are also elements of the individual
model structure (step 325), the low order model structure
may be determined directly from the individual model
structure.
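
The decomposition and substitution steps can be sketched in plain Python. This is an illustration under stated assumptions: the right-hand side `b` is a hypothetical stand-in for whatever scaled low order model structure eqn. 6 places there, the function names are invented here, and the matrix is assumed symmetric positive definite.

```python
import math

def cholesky_upper(R):
    """Return upper-triangular U with U^t U = R.

    Assumes R is symmetric positive definite.
    """
    n = len(R)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = R[i][j] - sum(U[k][i] * U[k][j] for k in range(i))
            U[i][j] = math.sqrt(s) if i == j else s / U[i][i]
    return U

def solve_model(R, b):
    """Solve U^t U w = b for w by forward then back substitution.

    b is a hypothetical right-hand side vector (an assumption here).
    """
    n = len(R)
    U = cholesky_upper(R)
    # Forward substitution: U^t y = b (U^t is lower triangular).
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(U[k][i] * y[k] for k in range(i))) / U[i][i]
    # Back substitution: U w = y.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (y[i] - sum(U[i][k] * w[k] for k in range(i + 1, n))) / U[i][i]
    return w
```

Solving through the triangular factors avoids forming an explicit inverse of the model matrix, which is what keeps this step inexpensive.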

In an alternative embodiment, columns of $R_A$ and the
corresponding element of $w_A$ may be eliminated. This
operation reduces the number of classifier parameters,
yielding a smaller implementation.

In step 360, models are grouped to determine a series of 
states associated with each predetermined class. In the 
preferred embodiment, models which represent a predeter- 



FIG. 4 is a flowchart illustrating a method for identifying
a class as a predetermined class in accordance with a
preferred embodiment of the present invention. In the pre- 
ferred embodiment, method 400 is performed by a combi- 
nation of classifiers and a comparator. Preferably, method 
400 describes a method for identifying a class as at least one 
of a set of predetermined classes. Preferably, each of the set 
of predetermined classes is represented by at least one 
model. A suitable method for training (e.g., creating) models 
is described in method 300 (FIG. 3). 

In step 405, the cost for each state for each classifier is 
initialized. In the preferred embodiment, each classifier 
includes a set of model multipliers. Preferably, each model 
multiplier represents one state for each of the series of states 
representing the class. For example, when a classifier 
includes models which represent the word "happy", the 
classifier uses four models (e.g., states) to represent the 
phonemes which in turn represent the word. In other words, 
one model multiplier includes the model for the phoneme 
"H", another model multiplier includes the model for pho- 
neme "AE", and so forth for each phoneme. Since the 
selector accepts the observation cost generated by each 
model multiplier, the selector preferably initializes the cost 
associated with each model multiplier to zero. 

In step 410, vectors representing an unidentified class are 
determined. In the preferred embodiment, feature vectors 
may be determined similar to the method for determining 
feature vectors in step 305 (FIG. 3). 

In step 415, a polynomial expansion is performed for the 
coefficients for each of the vectors. In the preferred 
embodiment, a polynomial expansion is performed for the 
feature vectors determined in step 410. Preferably, the 
polynomial expansion performed in step 415 is similar to the 
polynomial expansion performed in step 320 (FIG. 3). 

In step 420, selected ones of the set of models are 
multiplied with a vector to determine a cost. In the preferred 
embodiment, models for selected ones of the series of states 
representing a predetermined class are multiplied with an 
unidentified feature vector. Initially, a first model (e.g., first 
40 state) representing a predetermined class is multiplied by an 
unidentified feature vector. The selector preferably enables 
model multipliers to perform subsequent multiplication 
steps with subsequent unidentified feature vectors based on 
a trellis diagram. The observation cost for each "enabled" 
45 model multiplier is accumulated. For example, when a 
classifier includes models for phonemes which represent the 
word "happy", a first unidentified feature vector is multi- 
plied with one of the models (e.g., the "state 1" model 
representing "H" in the word "happy" is activated by the 



(eqn. 6) 
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mined class are grouped. Grouping the models effectively 50 selector, the selector enables model multipliers based on a 



determines a series of states which represent the predeter- 
mined class. For example, for the word "happy", a model, 
and therefore a state, for each of the phonemes are grouped 
together. As described above, when each phoneme is repre- 
sented by a slate (e.g., SI, S2, S3, S4), the series of states 
identifies the associated word (e.g., happy). Therefore, dur- 
ing an identification method for a class, such as the method 
discussed below, a series of states determined for an uni- 
dentified class may be used to identify the class as one of a 
set of predetermined classes (e.g., the word "happy"). 

Additionally, in step 360, models representing the set of 
predetermined classes may be stored. In a preferred 
embodiment, a class model for a class is stored in a memory. 
Among other things, the memory may be a random access 
memory (RAM), a database, magnetic storage media such as 
disk or tape, read-only memory (ROM), and other types of 
suitable data storage. 



65 



trellis diagram for the predetermined class). The observation 
cost for the multiplication step is preferably accumulated by 
the selector for the associated state. 
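The expansion and multiplication of steps 415 and 420 can be sketched as follows. This is an illustration under stated assumptions: the expansion is taken to be all monomials of the feature vector entries up to a given order, listed constant term first, and the observation cost is taken to be the inner product of a model with the expanded vector. The ordering of terms, the function names, and the toy numbers are hypothetical.

```python
from itertools import combinations_with_replacement

def polynomial_expansion(x, order):
    """Return all monomials of the entries of x up to `order`.

    The term ordering (constant, then degree 1, 2, ...) is an assumption.
    """
    terms = [1.0]
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            prod = 1.0
            for i in idx:
                prod *= x[i]
            terms.append(prod)
    return terms

def observation_cost(model, x, order):
    """Multiply a model with the expanded feature vector (inner product)."""
    p = polynomial_expansion(x, order)
    if len(model) != len(p):
        raise ValueError("model and expanded vector differ in length")
    return sum(w * t for w, t in zip(model, p))

# Hypothetical two-element feature vector and second-order model.
x = [2.0, 3.0]
model_h = [0.0, 1.0, 0.0, 0.0, 0.0, 0.5]
cost = observation_cost(model_h, x, 2)  # 1.0*2.0 + 0.5*9.0 = 6.5
```

In this sketch each model multiplier corresponds to one call to `observation_cost`, so the per-vector work is a single inner product per enabled state.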

Again, for example, assume that the models for happy are 
represented by the phonemes "H", "AE", "P", "IY", respec- 
tively. Further assume that the unidentified feature vectors 
represent an "unidentified" class for the word "hat" (e.g., 
"H", "AE", "T"). When the first unidentified feature vector 
is multiplied by an "H" model for happy, the multiplication 
step generates an observation cost. The selector accumulates 
the observation cost for the respective states based on the 
trellis diagram for the predetermined class. In this example, 
the model multipliers associated with the "H" and the "AE" 
phonemes are the next likely models to produce the largest 
observation cost because operations performed by the clas- 
sifiers maximize the observation costs for each unidentified 
vector. 




In step 430, the cost for each unidentified vector is
accumulated. In the preferred embodiment, the selector
accumulates the observation cost for each unidentified feature
vector for each state represented in each classifier.

In step 435, a check is performed to determine whether
additional multiplication steps need to be performed. In the
preferred embodiment, when additional unidentified feature
vectors need to be processed, steps 420-435 are repeated.
When no additional unidentified feature vectors need to be
processed, step 440 is performed.

In step 440, a class is identified. In the preferred 
embodiment, an unidentified class is identified based on the 
total cost accumulated in step 430. Preferably, the predeter- 
mined class associated with the classifier which produces the 
largest total cost identifies the unidentified class. 
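The loop over steps 420 to 440 can be sketched as follows. The sketch assumes a left-to-right trellis in which each state may either repeat or advance to the next state, that a path must start in the first state and end in the last state, and that vectors are already polynomial-expanded. These trellis rules, the function names, and the two-state toy classes "ab" and "ba" are assumptions made for illustration.

```python
NEG_INF = float("-inf")

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def classifier_total_cost(models, vectors):
    """Accumulate observation costs along the best trellis path."""
    n = len(models)
    acc = [NEG_INF] * n                     # disabled states
    acc[0] = 0.0 + dot(models[0], vectors[0])  # cost starts at zero (step 405)
    for x in vectors[1:]:
        new = [NEG_INF] * n
        for s in range(n):
            # A state is enabled if it or its predecessor was reached.
            best_prev = acc[s] if s == 0 else max(acc[s], acc[s - 1])
            if best_prev > NEG_INF:
                new[s] = best_prev + dot(models[s], x)
        acc = new
    return acc[-1]  # assumed: the path must end in the last state

def identify(classifiers, vectors):
    """Comparator: pick the class whose classifier gives the largest total."""
    totals = {name: classifier_total_cost(m, vectors)
              for name, m in classifiers.items()}
    return max(totals, key=totals.get), totals

# Hypothetical two-state classes and already-expanded vectors.
classifiers = {
    "ab": [[1.0, 0.0], [0.0, 1.0]],
    "ba": [[0.0, 1.0], [1.0, 0.0]],
}
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
winner, totals = identify(classifiers, vectors)  # "ab", totals 3.0 vs 1.0
```

Only one accumulated cost per state is kept between vectors, so the memory needed by each classifier grows with the number of states rather than with the number of vectors.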

It should also be noted that, as those skilled in the art will
appreciate, the training and identification methods can be
extended to use transition penalties.

Thus, among other things, what has been shown are a
system and method for identifying classes from a collection
of predetermined classes using limited processing and storage
resources. What has also been shown are a system and
method which can train a set of predetermined classes using
limited processing and storage resources. What has also
been shown are a system and method which combine
observation cost and state information when identifying
classes from a set of predetermined classes and training
models which represent the set of predetermined classes.
Also shown are a system and method for identifying an
acoustic event. Also shown are a system and method for
accurately modeling the probability that a certain sound
emitted for text independent speaker verification is encapsulated
in observation probabilities. What has also been
shown are a system and method for modeling a global
optimum that a certain sound has been emitted is encapsulated
in observation probabilities.

The foregoing description of the specific embodiments
will so fully reveal the general nature of the invention that
others can, by applying current knowledge, readily modify
and/or adapt for various applications such specific embodiments
without departing from the generic concept, and
therefore such adaptations and modifications should and are
intended to be comprehended within the meaning and range
of equivalents of the disclosed embodiments.

It is to be understood that the phraseology or terminology
employed herein is for the purpose of description and not of
limitation. Accordingly, the invention is intended to embrace
all such alternatives, modifications, equivalents and variations
as fall within the spirit and broad scope of the
appended claims.

What is claimed is: 

1. A method for training a set of models by classifier and
training system, each of the set of models representing at
least part of a predetermined speech recognition class, the
predetermined class being one of a set of predetermined
classes, the method comprising the steps of:

associating vectors for the set of predetermined classes
with at least one of a group of predetermined states,
each of the group of predetermined states representing
at least one of the set of models;

combining the vectors to determine an individual model
structure for each of the set of models;

producing a combined model structure for each of the set
of models based on the individual model structure; and

creating each of the set of models based on the combined
model structure and the vectors,

the method identifying a class as at least one of the set of
predetermined classes, wherein the method further
comprises the steps of:

determining unidentified vectors which represent the
class;

multiplying selected ones of the set of models with the 
unidentified vectors to determine a cost associated 
with each of the unidentified vectors; 

accumulating the cost for each of the unidentified 
vectors to determine a total cost for the unidentified 
vectors; and 

identifying the speech recognition class by the classifier 
and training system as at least one of the set of 
predetermined classes based on the total cost. 

2. A method as claimed in claim 1, further comprising the 
steps of: 

vector quantizing coefficients for the vectors for each of
the set of models; and

performing a polynomial expansion of the coefficients for
each of the vectors before performing the combining
step.

3. A method as claimed in claim 1, further comprising the 
step of determining a total model structure based on the 
individual model structure for each of the set of models, 

and wherein the combined model structure for each of the 
set of models is further based on the total model 
structure. 

4. A method as claimed in claim 1, further comprising the 
steps of: 

mapping the combined model structure to a matrix for
each of the set of models; and

decomposing the matrix for each of the set of models,

and wherein the creating step includes determining each
of the set of models based on the matrix and the vectors.

5. A method as claimed in claim 1, further including the 
step of performing a polynomial expansion of coefficients 
for each of the unidentified vectors before performing the 
multiplying step. 

6. A method as claimed in claim 1, wherein each of the 
group of predetermined states represents a phoneme. 

7. A method as claimed in claim 1, wherein each of the set
of predetermined classes represents a spoken word.

8. A method as claimed in claim 1, wherein each of the set
of predetermined classes represents a digital image.

9. A method as claimed in claim 1, wherein each of the set
of predetermined classes represents a radio signature.

10. A method as claimed in claim 1, wherein each of the 
set of predetermined classes represents a speaker. 

11. A method for identifying a speech recognition class by
classifier and training system as at least one of a set of
predetermined classes, each of the set of predetermined
classes being represented by at least one of a set of models,
the method comprising the steps of:

determining unidentified vectors which represent the
class;

multiplying selected ones of the set of models with the
unidentified vectors to determine a cost associated with
each of the unidentified vectors;

accumulating the cost for each of the unidentified vectors
to determine a total cost for the unidentified vectors;
and

identifying the speech recognition class by the classifier 
and training system as at least one of the set of 
predetermined classes based on the total cost. 

12. A method as claimed in claim 11, further including the 
step of performing a polynomial expansion of coefficients 
for each of the unidentified vectors before performing the 
multiplying step. 
13. A method as claimed in claim 11, for training the set
of models, each of the set of models representing at least part 
of a predetermined class, the predetermined class being one 
of the set of predetermined classes, the method comprising 
the steps of: 

associating vectors for the set of predetermined classes 
with at least one of a group of predetermined states, 
each of the group of predetermined states representing 
at least one of the set of models; 

combining the vectors to determine an individual model 
structure for each of the set of models; 

producing a combined model structure for each of the set 
of models based on the individual model structure; and 

creating each of the set of models based on the combined 
model structure and the vectors. 

14. A method as claimed in claim 13, further comprising 
the steps of: 

vector quantizing coefficients for the vectors for each of
the set of models; and

performing a polynomial expansion of the coefficients for
each of the vectors before performing the combining
step.

15. A method as claimed in claim 13, further comprising 
the step of determining a total model structure based on the 
individual model structure for each of the set of models, 

and wherein the combined model structure for each of the 
set of models is further based on the total model 
structure. 

16. A method as claimed in claim 13, further comprising 
the steps of: 

mapping the combined model structure to a matrix for
each of the set of models; and

decomposing the matrix for each of the set of models,

and wherein the creating step includes determining each
of the set of models based on the matrix and the vectors.

17. A classifier and training system for identifying a 
speech recognition class as at least one of a set of prede- 
termined classes, the class being represented by a plurality 
of unidentified vectors, each of the set of predetermined 
classes being represented by at least one of a set of prede- 
termined models, each of the set of predetermined models 
representing a predetermined state, the predetermined state 
being one of a group of predetermined states, the system 
comprising: 

a plurality of classifiers for receiving the set of predeter- 
mined models and the plurality of unidentified vectors 
and generating costs, 

wherein each of the plurality of classifiers is further 
comprised of: 

a set of model multipliers for receiving models and 
unidentified vectors to generate the costs; 

a selector for receiving the costs, enabling selected ones 
of the set of model multipliers based on the costs, 
and storing the costs in a memory; and 

a comparator coupled to each of the plurality of clas- 
sifiers for comparing the costs generated from each 
of the plurality of classifiers and identifying the 
speech recognition class by the classifier and training
system based on the costs. 

18. A method for identifying a speech recognition class by
classifier and training system as at least one of a set of
predetermined classes, the method comprising the steps of:

representing each of the set of predetermined classes by at
least one of a set of models;

training the set of models;

determining unidentified vectors which represent the
class;

multiplying selected ones of the set of models with the
unidentified vectors to determine a cost associated with
each of the unidentified vectors;

accumulating the cost for each of the unidentified vectors
to determine a total cost for the unidentified vectors;
and

identifying the speech recognition class by the classifier
and training system as at least one of the set of
predetermined classes based on the total cost.

19. A method as claimed in claim 18, further including the 
step of performing a polynomial expansion of coefficients 
for each of the unidentified vectors before performing the 
multiplying step. 

20. A method as claimed in claim 18, wherein the training
step further includes the steps of:

associating vectors for the set of predetermined classes
with at least one of a group of predetermined states,
each of the group of predetermined states representing
at least one of the set of models;

combining the vectors to determine an individual model
structure for each of the set of models;

producing a combined model structure for each of the set
of models based on the individual model structure; and

creating each of the set of models based on the combined
model structure and the vectors.

21. A method as claimed in claim 20, further comprising
the steps of:

vector quantizing coefficients for the vectors for each of
the set of models; and

performing a polynomial expansion of the coefficients for
each of the vectors before performing the combining
step.

22. A method as claimed in claim 20, further comprising
the step of determining a total model structure based on the
individual model structure for each of the set of models,

and wherein the combined model structure for each of the
set of models is further based on the total model
structure.

23. A method as claimed in claim 20, further comprising 
the steps of: 

mapping the combined model structure to a matrix for
each of the set of models; and

decomposing the matrix for each of the set of models,

and wherein the creating step includes determining each
of the set of models based on the matrix and the vectors.